Introduction


Through the analysis of small self-cleaving RNAs originally seen in
tobacco ringspot viral RNA, a consensus sequence known as the hammerhead
ribozyme was determined. Although the natural behavior of these molecules is
cis-cleavage (i.e., substrate and catalytic domain in one molecule), it has
subsequently been shown that the catalytic domain of a hammerhead ribozyme
can be used in trans-cleavage against a separate RNA substrate. By introducing
flanking sequences around the hammerhead ribozyme, cleavage specificity can
be increased (6). Ribozymes can be introduced into a cell either by transfection
or exogenous delivery and, if properly targeted, may be used to decrease
endogenous or pathogenic expression of proteins by decreasing the number of
RNA transcripts (4, 5).

Hammerhead ribozymes can be useful in studying certain signal-transduction
problems, where the involvement of candidate substrates and effector proteins
needs to be tested. If the genetic sequence for a putative component is known,
cDNA coding for the candidate protein can be transfected, causing its
overexpression. Changes in the sensitivity of signal transduction can then be
measured. Conversely, ribozymes can be designed to cleave endogenous RNA
transcripts coding for the candidate protein. When such ribozymes are
transfected, the candidate protein will be underexpressed. The strength of
signal transduction can then be analyzed with the candidate protein present in
diminished quantities.

To test whether insulin receptor substrate-1 (IRS-1) is involved in the
insulin-stimulated recruitment of glucose transporters, we have recently
performed such an analysis. When insulin binds to its receptor in muscle and
adipose cells, multiple effects take place, including increased glycogen synthesis,
activation of transcriptional factors, and recruitment of glucose transporters to
the cell surface (8). Although mechanisms have been elucidated for
insulin-stimulated glycogen synthesis and transcriptional activation, the exact
stimulatory mechanism for glucose transporter recruitment is still unknown.
We started our analysis by testing components from the better described
pathways to see whether they were involved in the insulin-stimulated
recruitment of glucose transporters.

After transfecting cDNA coding for human insulin receptor substrate-1
(IRS-1) into rat adipose cells, we found those cells demonstrated increased
sensitivity in insulin-stimulated recruitment of glucose transporters. To show
that this was because of increased IRS-1 and not some other non-specific effect,
we designed a cDNA coding for a hammerhead ribozyme targeted against
endogenous rat IRS-1 mRNA. After the anti-rat IRS-1 ribozyme was transfected
into rat adipose cells, the cells showed decreased sensitivity of
insulin-stimulated glucose transporter recruitment (1).

We then performed the joint experiment of transfecting both the cDNA 
for human IRS-1 and the cDNA for the anti-rat IRS-1 ribozyme. This required 
that the ribozyme be designed to cut only rat IRS-1 transcripts and not human 
IRS-1 transcripts, a difficult task given that the two sequences are highly 
homologous. With such a ribozyme, we found that human IRS-1 is able to 
restore the loss of function caused by the anti-rat IRS-1 ribozyme (1). 

Designing a ribozyme to cleave a specific transcript can be laborious if the
specific target sequence is similar to other transcripts which should not be
cleaved. I have developed a program, entitled RIBOFIND, which automates the
selection of target sequences so that a hammerhead ribozyme may be fashioned
to cleave one mRNA but not another.


Ribozyme Design 


The hammerhead consensus sequence was initially derived from tobacco 
ringspot virus, avocado sun blotch viroid, and the virusoid of lucerne transient 
streak virus (5). The consensus sequence is shown in Figure 1 alongside an RNA 
substrate. The ribozyme consists of three helical regions and an unpaired core. 
The target cleavage site often used is GUC, but GUA, GUG and GUU may
sometimes be used depending on the other variable nucleotides in the ribozyme
(5).

Substrate specificity can be controlled by the addition of flanking regions
surrounding the ribozyme. With these present, the ribozyme and flanking
regions will preferentially pair with complementary regions surrounding the
correct GUC cleavage site, as demonstrated in Figure 1 (6).
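Because each flanking region pairs antiparallel with the substrate, a flanking arm written 5' to 3' is the reverse complement of the substrate sequence it binds. The following is a minimal sketch of that relationship only; the helper names rna_complement and binding_arm are hypothetical and do not appear in RIBOFIND, and the catalytic core of the ribozyme is not modeled.

```c
#include <stddef.h>
#include <string.h>

/* Watson-Crick complement of one RNA nucleotide; 'N' for anything else. */
static char rna_complement(char base)
{
    switch (base) {
    case 'A': return 'U';
    case 'U': return 'A';
    case 'G': return 'C';
    case 'C': return 'G';
    default:  return 'N';
    }
}

/* Write the reverse complement of a substrate flank into arm.
   Hypothetical helper: arm must hold strlen(flank) + 1 bytes. */
static void binding_arm(const char *flank, char *arm)
{
    size_t n = strlen(flank);
    size_t i;

    for (i = 0; i < n; i++)
        arm[i] = rna_complement(flank[n - 1 - i]);
    arm[n] = '\0';
}
```

For the left substrate flank CUAGGCCAGA of the site used in Figure 1, binding_arm yields UCUGGCCUAG.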


Existing Computer Programs 


Denman describes a program called RNAFOLD which predicts the activity 
of small catalytic RNAs (2). Given a cleavage site and a ribozyme, RNAFOLD 
models the secondary structure of the cis-acting forms of the interaction using 
an energy minimization algorithm. The computed stability of the cis-acting 
hammerhead conformers was shown to correlate well with the activity of the 
trans-acting ribozymes measured experimentally. 

Muller et al. developed an algorithm called RNASEARCH to rapidly
generate displays of secondary structures of RNA, including those of catalytic
RNAs (3). Aside from matching bases to find helices, this program does not
further analyze the RNA sequences.


Methods 


Programming 


RIBOFIND was written in the ANSI C language using the Metrowerks 
CodeWarrior DRS compiler on the Macintosh. The source code for RIBOFIND is 
provided in Figure 2 and can easily be ported to other hardware platforms. The 
program first asks the user for a target sequence and an avoidance sequence. The 
program then finds potential ribozyme cleavage sites in the target sequence that 
are as dissimilar as possible to the avoidance sequence. 

The algorithm works by first scanning the target sequence for all GUC
sites. For each site, flanking sequences of up to ten nucleotides are found and
the overall target site sequence (flanking sequences plus GUC) is concatenated.
The program then slides this target site sequence along the avoidance sequence,
counting at each offset the nucleotides that match position for position, and
reports the maximum number of matching nucleotides found. RIBOFIND passes
over GUC sites that have fewer than six flanking nucleotides on either side, as
these may not provide enough of a flanking sequence for specificity.
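The steps above can be sketched compactly as follows. This is an illustrative condensation under stated assumptions, not the listing in Figure 2: the names MAX_FLANK, MIN_FLANK, build_site, best_match and least_similar_score are hypothetical, the comparison is ungapped, and both sequences are assumed to be already converted to uppercase RNA letters.

```c
#include <stddef.h>
#include <string.h>

#define MAX_FLANK 10   /* flanking nucleotides kept on each side */
#define MIN_FLANK 6    /* sites with shorter flanks are skipped  */

/* Copy flank + GUC + flank around the GUC at target[g] into site
   (needs 2 * MAX_FLANK + 4 bytes). Returns 0 if a flank is too short. */
static int build_site(const char *target, size_t g, char *site)
{
    size_t len = strlen(target);
    size_t left = g < MAX_FLANK ? g : MAX_FLANK;
    size_t right = len - (g + 3);

    if (right > MAX_FLANK)
        right = MAX_FLANK;
    if (left < MIN_FLANK || right < MIN_FLANK)
        return 0;
    memcpy(site, target + g - left, left + 3 + right);
    site[left + 3 + right] = '\0';
    return 1;
}

/* Largest count of position-wise identical nucleotides between site
   and any window of avoid. */
static size_t best_match(const char *site, const char *avoid)
{
    size_t site_len = strlen(site);
    size_t avoid_len = strlen(avoid);
    size_t best = 0, start, i;

    for (start = 0; start < avoid_len; start++) {
        size_t matches = 0;
        for (i = 0; i < site_len && start + i < avoid_len; i++)
            if (site[i] == avoid[start + i])
                matches++;
        if (matches > best)
            best = matches;
    }
    return best;
}

/* Smallest best_match() score over all usable GUC sites in target,
   or -1 if no site has enough flanking sequence. */
static long least_similar_score(const char *target, const char *avoid)
{
    char site[2 * MAX_FLANK + 4];
    const char *p = target;
    long lowest = -1;

    while ((p = strstr(p, "GUC")) != NULL) {
        if (build_site(target, (size_t)(p - target), site)) {
            long score = (long) best_match(site, avoid);
            if (lowest < 0 || score < lowest)
                lowest = score;
        }
        p += 3;   /* GUC cannot overlap itself, so skip past it */
    }
    return lowest;
}
```

Dividing a best_match result by the site length gives the fractional similarity that the program reports as a percentage. The scan costs time proportional to the site length times the avoidance sequence length for every GUC site, which is acceptable for mRNA-sized inputs.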


Execution 


The cDNA sequences for human and rat IRS-1 were retrieved from
GENBANK through the Internet (human mRNA accession S62539, rat mRNA
accession X58375). These are the same sequences we used in manually
designing the ribozyme in our previous transfection experiments (1). RIBOFIND
was executed with the two sequences as input. Each run took 30 seconds on a
33 MHz 68LC040 processor (Macintosh PowerBook 540c, MacOS 7.5.1).
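Since the retrieved records are cDNA, thymine has to be read as uracil before the sequences can be searched for GUC; the listing in Figure 2 does this while loading each file and also skips any text enclosed in square brackets. A minimal sketch of that clean-up step is shown here, with the hypothetical name normalize_sequence standing in for the inline loop of the original program.

```c
#include <ctype.h>

/* Normalize a sequence in place: uppercase it, turn T into U, keep only
   A, C, G and U, and drop anything inside (possibly nested) [brackets]. */
static void normalize_sequence(char *seq)
{
    char *src = seq;
    char *dst = seq;
    int depth = 0;   /* current bracket nesting level */

    for (; *src != '\0'; src++) {
        int c = toupper((unsigned char) *src);

        if (c == '[') {
            depth++;
        } else if (c == ']') {
            if (depth > 0)
                depth--;
        } else if (depth == 0) {
            if (c == 'T')
                c = 'U';
            if (c == 'A' || c == 'C' || c == 'G' || c == 'U')
                *dst++ = (char) c;
        }
    }
    *dst = '\0';
}
```

Digits, spaces and line breaks from a formatted sequence record are discarded by the same filter, so numbered sequence lines can be pasted in unchanged.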


Results 


RIBOFIND was first executed with rat IRS-1 as a target sequence and
human IRS-1 as an avoidance sequence. RIBOFIND generated the list of the ten
best potential target sequences shown in Table 1. RIBOFIND was then executed
in reverse fashion, with human IRS-1 as a target sequence and rat IRS-1 as an
avoidance sequence. Those results are shown in Table 2.


Discussion 


Results 


I executed RIBOFIND searching for potential target sites in rat IRS-1 while
avoiding sites that could be cut in human IRS-1. RIBOFIND found 58 GUC
sequences in rat IRS-1. Each was made into a potential cleavage site by
concatenating the GUC sequence with 10 flanking nucleotides on each side,
making a 23-nucleotide site. These sites were then matched in human IRS-1
with resulting scores between 60.9% (14/23 nucleotides similar) and 100%
similarity (23/23 nucleotides similar). A list of the 10 least-similar cleavage sites
was saved and printed.

Interestingly, the target site we determined manually was on this list, but 
was ranked seventh, with a closest matching score of 69.6%. Five sites were 
found with scores lower than ours. 
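Because every full-length site is 23 nucleotides long, each score is a multiple of 1/23: 14 matches give 60.9%, 15 give 65.2%, 16 give 69.6% and 17 give 73.9%. The arithmetic can be checked with the short sketch below; the function name percent_similarity is hypothetical.

```c
/* Percent similarity for a given number of matching positions
   out of site_len compared positions. */
static double percent_similarity(int matches, int site_len)
{
    return 100.0 * (double) matches / (double) site_len;
}
```

The manually chosen site therefore differs from its closest human counterpart at 7 of 23 positions, while the top-ranked site differs at 9.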

I then executed RIBOFIND using human IRS-1 as the target sequence and
rat IRS-1 as the avoidance sequence. RIBOFIND found 62 GUC sequences, and
these potential cleavage sites ranked between 60.9% and 100% similarity.


Conclusions 


Executing RIBOFIND with an established problem of finding cleavage 
sites in rat IRS-1 while avoiding cutting human IRS-1 generated a list of ten 
useful sites to try. The site we found manually ranked only seventh on this list. 

The next logical step is to generate ribozymes for each of the top ten 
generated sites and to assay each for its cleavage efficiency in vitro against the 
substrates. Epitope-tagged substrates could be used to aid in the assay. Other 
future directions may include linking the output of RIBOFIND to other 
computer-assisted ribozyme design programs, such as RNAFOLD. 

I conclude that RIBOFIND is a useful program as a first step for finding
potential cleavage sites in a target sequence, when one does not wish to cut
other similar sequences.
Rank   Cut Between       Cut Site                              Closest
       Nucleotides                                             Matching
                                                               Score

  1    4055 and 4056     CAUCCAAGGA [GUC]* GGCUCCAGGG          60.9%
  2    236 and 237       GACUCCUCCG [GUC]* GUCUCUGCCU          65.2
  3    239 and 240       UCCUCCGGUC [GUC]* UCUGCCUCCC          65.2
  4    5241 and 5242     AUCCCUUCAA [GUC]* AAAAAUCUCU          65.2
  5    5328 and 5329     GACACUGUUG [GUC]* CCUCCCCACC          65.2
  6    2944 and 2945     CAGCAUCUUC [GUC]* UCUCUUCAAG          69.6
  7    3053 and 3054     CUAGGCCAGA [GUC]* UAGCGUCACA          69.6
  8    4310 and 4311     UUAACUGGAC [GUC]* ACAGGCAGAA          69.6
  9    3060 and 3061     AGAGUCUAGC [GUC]* ACACAUCCCC          73.9
 10    3355 and 3356     UCAGUUCGAU [GUC]* UACCCCAGCU          73.9


Table 1


Rank ordered list of ten potential ribozyme cleavage sites in rat IRS-1 that
have least similarity to human IRS-1. The seventh item in the list is
the cleavage site manually determined and shown to be useful (1).


Rank   Cut Between       Cut Site                              Closest
       Nucleotides                                             Matching
                                                               Score

  1    197 and 198       UCUCGCCCUU [GUC]* CCCUCCCCUC          60.9%
  2    244 and 245       GCAGGGAUGA [GUC]* UGUCCCUCCG          60.9
  3    248 and 249       GGAUGAGUCU [GUC]* CCUCCGGCCG          60.9
  4    5075 and 5076     CCACCACCGU [GUC]* AUGAGAGAAU          60.9
  5    13 and 14         CGGCGGCGCG [GUC]* GGAGGGGGCC          65.2
  6    420 and 421       CAGCGCCGCG [GUC]* UCUGCGACUG          65.2
  7    654 and 655       ACCCCCGACU [GUC]* GCCUCCCUGU          65.2
  8    1835 and 1836     GCAAGAGCCA [GUC]* CUCGUCCAAC          65.2
  9    2530 and 2531     UUGGGCACGA [GUC]* CAGCCUUGGC          65.2
 10    818 and 819       GAAAACUCCG [GUC]* GGGCUCUCUC          69.6


Table 2


Rank ordered list of ten potential ribozyme cleavage sites in human IRS-1
that have least similarity to rat IRS-1.


[Diagram of the hammerhead ribozyme paired with its RNA substrate. Labels:
Cleavage Site, Specificity Flanking Region (labeled twice), Helix 3.]


Figure 1


Ribozyme consensus sequence (derived from 5, 6 and 7). The 24 base-pair
ribozyme (light circles) is surrounded by flanking regions of 10 nucleotides each,
lending cleavage specificity. The cleavage site shown here (dark circles) is
between nucleotides 3053 and 3054 of rat IRS-1 mRNA.


#include <stdio.h>
#include <string.h>
#include <ctype.h>
#include <stdlib.h>


/* One candidate cleavage site and its closest match in the avoidance sequence. */
typedef struct {
    double fMatchScore;         /* percent similarity of the closest match */
    char fMatchedArea [32];     /* closest match; mismatches in lower case */
    char fLeft [12];            /* left flanking sequence */
    char fRight [12];           /* right flanking sequence */
    long fCutSite;              /* cut falls after this nucleotide (1-based) */
} saveRecord;


saveRecord gSavedSites [10];    /* ten least-similar sites, best first */


void scan_str1( char * str1, char * str2, char * str_guc );
void scan_str2( char * ribo_bind, char * str2, double * result_match,
                char * matched_area );
void initSavedSites( );
void printSavedSites( );
void checkAndSaveSite( saveRecord toCheck );


/* Start every slot at 100% similarity so that any real site can replace it. */
void
initSavedSites( )
{
    int i;
    for( i = 0; i < 10; i++ ) {
        gSavedSites[i].fMatchScore = 100.0;
        gSavedSites[i].fMatchedArea[0] = 0;
        gSavedSites[i].fLeft[0] = 0;
        gSavedSites[i].fRight[0] = 0;
    }
}

Figure 2
Source code to RIBOFIND. Program source is written in ANSI C and
continues below.
/* Print the saved sites, least similar first. */
void
printSavedSites( )
{
    int i;

    printf( "\n\nBEST TARGET SITES:\n\n" );
    for( i = 0; i < 10; i++ ) {
        printf( "\n+ Site %d: Cut between nucleotide %ld and %ld\n", i + 1,
                gSavedSites[i].fCutSite, gSavedSites[i].fCutSite + 1 );
        printf( "  %s [GUC]* %s\n", gSavedSites[i].fLeft, gSavedSites[i].fRight );
        printf( "  Closest Match Score: %f Closest Matching Site: %s\n",
                gSavedSites[i].fMatchScore,
                gSavedSites[i].fMatchedArea );
    }
}

/* Insert toCheck into the list, which is kept sorted by ascending score. */
void
checkAndSaveSite( saveRecord toCheck )
{
    int i;

    if( toCheck.fMatchScore > gSavedSites[9].fMatchScore )
        return;
    /* Shift worse entries down one slot until the insertion point is found. */
    for( i = 9; i >= 1; i-- ) {
        if( toCheck.fMatchScore < gSavedSites[i-1].fMatchScore ) {
            gSavedSites[i] = gSavedSites[i-1];
        } else {
            gSavedSites[i] = toCheck;
            break;
        }
    }
    if( i == 0 )
        gSavedSites[i] = toCheck;
}
int
main( void )
{
    char * str1;
    char * str2;
    char * strSource;
    char * strDest;
    char * str_guc = "GUC";
    char fileName [30];
    FILE * theFile;
    long fileSize;
    char c;
    int acceptMode;

    initSavedSites( );

    /* Ask for the first (target) sequence file until one can be opened. */
    do {
        printf( "Enter name of first sequence:\n" );
        if( scanf( "%29[^\n]%*c", fileName ) != 1 )
            return 1;
        theFile = fopen( fileName, "r" );
        if( theFile == NULL ) {
            printf( "The file \"%s\" was not found.\n", fileName );
        }
    } while( theFile == NULL );

    /* Read the whole file, leaving room for a terminating zero. */
    fseek( theFile, 0, SEEK_END );
    fileSize = ftell( theFile );
    fseek( theFile, 0, SEEK_SET );
    str1 = malloc( fileSize + 1 );
    if( str1 == NULL ) {
        fclose( theFile );
        return 1;
    }
    fileSize = fread( str1, 1, fileSize, theFile );
    str1[fileSize] = 0;
    fclose( theFile );

    /* Filter in place: keep A, C, G and U, convert T to U, and skip
       anything enclosed in square brackets. */
    strSource = str1;
    strDest = str1;
    acceptMode = 0;
    while( (c = *strSource) != 0 ) {
        if( acceptMode == 0 ) {
            if( c == '[' ) {
                acceptMode ++;
            } else {
                c = toupper( (unsigned char) c );
                if( c == 'T' ) {
                    *strDest = 'U';
                    strDest ++;
                } else if( c == 'A' || c == 'C' || c == 'U' || c == 'G' ) {
                    *strDest = c;
                    strDest ++;
                }
            }
        } else {
            if( c == ']' ) {
                acceptMode --;
                if( acceptMode < 0 )
                    acceptMode = 0;
            } else if( c == '[' ) {
                acceptMode ++;
            }
        }
        strSource ++;
    }
    *strDest = 0;

    printf( "File \"%s\": size %ld\n", fileName, (long) strlen( str1 ) );

    /* Ask for the second (avoidance) sequence file. */
    do {
        printf( "Enter name of second sequence:\n" );
        if( scanf( "%29[^\n]%*c", fileName ) != 1 )
            return 1;
        theFile = fopen( fileName, "r" );
        if( theFile == NULL ) {
            printf( "The file \"%s\" was not found.\n", fileName );
        }
    } while( theFile == NULL );
    /* Read the second file, leaving room for a terminating zero. */
    fseek( theFile, 0, SEEK_END );
    fileSize = ftell( theFile );
    fseek( theFile, 0, SEEK_SET );
    str2 = malloc( fileSize + 1 );
    if( str2 == NULL ) {
        fclose( theFile );
        return 1;
    }
    fileSize = fread( str2, 1, fileSize, theFile );
    str2[fileSize] = 0;
    fclose( theFile );

    /* Same in-place filter as for the first sequence. */
    strSource = str2;
    strDest = str2;
    acceptMode = 0;
    while( (c = *strSource) != 0 ) {
        if( acceptMode == 0 ) {
            if( c == '[' ) {
                acceptMode ++;
            } else {
                c = toupper( (unsigned char) c );
                if( c == 'T' ) {
                    *strDest = 'U';
                    strDest ++;
                } else if( c == 'A' || c == 'C' || c == 'U' || c == 'G' ) {
                    *strDest = c;
                    strDest ++;
                }
            }
        } else {
            if( c == ']' ) {
                acceptMode --;
                if( acceptMode < 0 )
                    acceptMode = 0;
            } else if( c == '[' ) {
                acceptMode ++;
            }
        }
        strSource ++;
    }
    *strDest = 0;

    printf( "File \"%s\": size %ld\n", fileName, (long) strlen( str2 ) );

    scan_str1( str1, str2, str_guc );
    printSavedSites( );
    return 0;
}
/* Find every GUC in str1, build its flanked site, and score it against str2. */
void
scan_str1( char * str1, char * str2, char * str_guc )
{
    char * str1_index;
    long cut_counter;
    char left_flank[12];
    char right_flank[12];
    long cut_site;
    long max_left;
    long max_right;
    char ribo_bind [32];
    char matched_area [32];
    double max_matched;
    saveRecord currentRecord;

    for( cut_counter = 0, str1_index = strstr( str1, str_guc );
         str1_index != NULL;
         str1_index = strstr( str1_index + strlen( str_guc ), str_guc ),
         cut_counter ++ ) {
        /* 1-based position of the last nucleotide of the GUC. */
        cut_site = (long)( str1_index - str1 ) + (long) strlen( str_guc );
        printf( "\n+ Site %ld: Cut between nucleotide %ld and %ld\n",
                cut_counter + 1, cut_site, cut_site + 1 );

        /* Take up to ten flanking nucleotides on each side. */
        max_left = cut_site - (long) strlen( str_guc );
        if( max_left > 10 )
            max_left = 10;
        max_right = (long) strlen( str1 ) - cut_site;
        if( max_right > 10 )
            max_right = 10;

        strncpy( left_flank, &str1[ cut_site - strlen( str_guc ) - max_left ],
                 max_left );
        left_flank[max_left] = 0;
        strncpy( right_flank, &str1[ cut_site ], max_right );
        right_flank[max_right] = 0;
        printf( "  %s [GUC]* %s\n", left_flank, right_flank );

        if( max_left < 6 || max_right < 6 ) {
            printf( "  Ignoring site because only %ld %s-flanking nucleotides are present\n",
                    (max_left < 6) ? max_left : max_right,
                    (max_left < 6) ? "left" : "right" );
            continue;
        }

        sprintf( ribo_bind, "%sGUC%s", left_flank, right_flank );
        scan_str2( ribo_bind, str2, &max_matched, matched_area );
        max_matched *= 100.0;
        printf( "  Most closely matches %s (%3.f%% similarity)\n", matched_area,
                max_matched );

        currentRecord.fMatchScore = max_matched;
        strcpy( currentRecord.fMatchedArea, matched_area );
        strcpy( currentRecord.fLeft, left_flank );
        strcpy( currentRecord.fRight, right_flank );
        currentRecord.fCutSite = cut_site;

        checkAndSaveSite( currentRecord );
    }
}


/* Slide ribo_bind along str2 and report the best position-wise match. */
void
scan_str2( char * ribo_bind, char * str2, double * result_match, char * matched_area )
{
    long number_match;
    char * current_str2;
    char * current_ribo;
    long start_index = 0;
    long max_number_match = 0;
    long max_index = 0;

    current_str2 = &str2[start_index];
    while( *current_str2 != 0 ) {
        current_ribo = ribo_bind;
        number_match = 0;

        /* Count identical nucleotides at this offset. */
        while( *current_str2 != 0 && *current_ribo != 0 ) {
            if( *current_ribo == *current_str2 )
                number_match ++;
            current_ribo ++;
            current_str2 ++;
        }

        if( number_match > max_number_match ) {
            max_number_match = number_match;
            max_index = start_index;
        }

        start_index ++;
        current_str2 = &str2[start_index];
    }

    /* Copy the site, showing mismatched positions as the lower-case
       nucleotide found in the avoidance sequence. */
    strcpy( matched_area, ribo_bind );
    str2 = &str2[ max_index ];
    while( *matched_area != 0 && *str2 != 0 ) {
        if( *matched_area != *str2 )
            *matched_area = tolower( (unsigned char) *str2 );
        matched_area ++;
        str2 ++;
    }

    *result_match = (double) max_number_match / strlen( ribo_bind );
}